Executive Summary


The  vast  amount  of  geospatial  data  now  available  covering  the  entire  world  presents  new 
and  exciting  opportunities  to  derive  new  information  through  information  fusion.  These 
data  sources  include  mapping  services  (Google  Maps,  Yahoo  Maps,  etc.),  Web  2.0  based 
collaborative  projects  (WikiMapia  and  OpenStreetMap),  traditional  geospatial  data 
sources  (raster  maps,  KML  vector  layers),  and  non-traditional  geospatial  data  sources 
(phone  books,  property  records,  etc.).  This  large  amount  of  diverse  data  increases  the 
probability  of  encountering  missing  or  inconsistent  data  and  requires  efficient  reasoning 
algorithms  to  scale  to  large  problem  instances  during  information  fusion.  To  address 
these  issues,  we  have  developed  a  geospatial  fusion  framework  that  integrates  the  various 
types  of  geospatial  data  available  within  a  region.  Our  approach  builds  on  our  past  work 
on  constraint  satisfaction  reasoning  and  data  access.  This  framework  supports  the  ability 
to  gather  and  fuse  information,  and  uses  conflict  resolution  strategies  to  disambiguate 
data inconsistencies. We implemented our approach in a system called InfoFuse and
successfully demonstrated it on real-world data for Belgrade.

Objectives  of  the  Research  Effort: 

In a previous AFOSR-funded project, we presented a constraint satisfaction approach to
identifying  the  buildings  shown  in  a  satellite  image  by  fusing  imagery,  road  vector  data, 
and  online  telephone  books.  The  resulting  fused  information  can  then  be  used  to  augment 
and  update  a  geospatial  database  such  as  a  gazetteer.  This  previous  work  demonstrated 
that  combining  traditional  and  non-traditional  data  is  a  means  to  deriving  information 
from  multiple  sources  that  is  not  available  in  any  single  source.  The  work  also 
highlighted  the  benefits  of  fusing  diverse  data  sources.  In  general,  the  vast  amount  of  data 
now  available  covering  the  entire  world  provides  new  and  exciting  opportunities  to  derive 
new  information  through  information  fusion. 

However,  there  are  several  key  challenges  that  need  to  be  addressed  in  order  to  fully 
realize  the  benefits  of  fusing  these  diverse  types  of  sources.  These  data  sources  include 
mapping  services  (Google  Maps,  Yahoo  Maps,  etc.),  Web  2.0  based  collaborative 
projects  (WikiMapia  and  OpenStreetMap),  traditional  geospatial  data  sources  (raster 
maps,  KML  vector  layers,  gazetteers),  and  non-traditional  geospatial  data  sources  (phone 
books,  property  records,  etc.).  The  large  amount  of  data  that  can  be  exploited  also 
increases  the  probability  of  encountering  source  failures  or  data  inconsistencies. 
Therefore, any system that works with real-world data sources must be able to
handle these issues. By introducing more data, we are also
presented  with  a  scaling  problem.  Any  reasoning  algorithms  and  conflict  resolution 
strategies  need  to  scale  to  larger  problem  instances  and  support  larger  numbers  of  data 
sources. 


Accomplishments/New  Findings 


We  developed  a  general  fusion  framework  that  integrates  the  various  types  of  geospatial 
data  available  within  a  region.  Our  approach  builds  on  our  past  work  on  constraint 
satisfaction  reasoning  and  data  access.  This  framework  supports  the  ability  to  gather  and 
fuse  information,  using  conflict  resolution  strategies  to  disambiguate  data  inconsistencies. 
This framework supports both the integration of and reasoning about heterogeneous geospatial
data. The data integration tasks involve gathering the available geospatial data from a
wide  variety  of  sources,  such  as  those  listed  above.  The  geospatial  reasoning  processes 
can  infer  new  and  useful  knowledge  about  a  region  by  applying  various  reasoning 
methods over the integrated data. An example of a geospatial reasoning process is
identifying streets and street names from raster maps. Figure 1 shows an example
screenshot where a variety of data sources and reasoning capabilities have been
combined in a single integrated framework. In this figure, the fusion of the datasets
and the reasoning processes makes it possible to identify the locations of the buildings, the
names  of  the  streets,  and  the  businesses  associated  with  each  of  the  buildings. 


Figure  1:  Area  in  Belgrade  before  and  after  the  geospatial  fusion  process 


The  integrated  framework  provides  a  common  platform  for  geospatial  data  integration 
and  reasoning  tasks.  It  allows  the  user  to  interactively  fuse  different  kinds  of  geospatial 
data  sources  and  exploit  the  integrated  data  to  carry  out  various  geospatial  reasoning 
processes.  We  have  also  developed  constraint  satisfaction  techniques  that  enable  the 
framework  to  automatically  infer  constraint  models  from  problem  instance  data  and 
improve  problem-solving  performance.  We  now  describe  the  accomplishments  in  detail. 


Inferring  Constraint  Models  from  Problem  Instance  Data 

We  have  shown  that  Constraint  Programming  (CP)  is  an  effective  paradigm  for  modeling 
and  solving  the  building  identification  problem.  However,  the  modeling  of  this  problem 
remains  an  art,  requiring  a  CP  expert  to  specify  the  variables,  their  domains,  and  the  set 
of constraints that govern a particular Constraint Satisfaction Problem (CSP). Further
complicating the modeling process is the need to specialize a given constraint model for
each city, because addressing conventions vary across cities throughout the world. To automate the
modeling  process  and  alleviate  the  load  placed  on  the  human  user,  we  developed  a 
framework  to  enrich  the  generic  constraint  model  by  adding  to  it  the  addressing 
constraints  that  apply  to  a  given  problem  instance  (Michalowski  et  al.  2007a).  These 
additional  constraints  are  inferred  from  the  input  data  of  a  problem  instance. 

The embedded information that we exploit is a set of instantiated variables (i.e.,
variable-value pairs), which we call landmark data points (i.e., buildings with known addresses).
Our  framework  tests  the  features  of  these  data  points  in  order  to  select,  from  a  library  of 
constraints,  those  addressing  constraints  that  should  be  added  to  the  generic  constraint 
model  of  the  problem.  The  creation,  storage,  and  maintenance  of  individual  constraint 
models,  for  all  cities,  that  account  for  all  of  the  applicable  addressing  constraints  is  an 
unrealistic  and  formidable  endeavor.  However,  the  work  required  of  the  expert  to  define 
constraints  that  capture  all  of  the  characteristics  of  addressing  seen  to  date  is  easier  and 
more  manageable.  Moreover,  combining  this  expert  knowledge  with  known  building 
addresses  provided  by  public  sources  such  as  gazetteers  allows  our  framework  to 
dynamically  build  a  constraint  model  of  an  area  of  interest.  This  constraint  model  plays  a 
vital  role  in  determining  the  precision  of  the  returned  solutions. 
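
The selection step described above can be illustrated with a minimal sketch. It is not the InfoFuse implementation: the `Landmark` fields, the two library constraints, and the sample landmark values are all assumptions made up for illustration. Each library entry tests whether a candidate addressing constraint is consistent with the known landmark points, and only the constraints that no landmark contradicts are added to the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Landmark:
    """A building whose address is already known (hypothetical fields)."""
    street: str
    side: str        # "left" or "right" side of the street
    position: float  # distance along the street from its start
    number: int      # known street number


def parity_by_side(points: List[Landmark]) -> bool:
    """Odd numbers on one side of the street, even numbers on the other."""
    parities: Dict[str, set] = {}
    for p in points:
        parities.setdefault(p.side, set()).add(p.number % 2)
    if any(len(seen) != 1 for seen in parities.values()):
        return False  # one side mixes odd and even numbers
    sides = list(parities.values())
    return len(sides) < 2 or sides[0] != sides[1]


def increasing_along_street(points: List[Landmark]) -> bool:
    """Numbers grow with distance from the start of the street, per side."""
    by_side: Dict[str, List[Landmark]] = {}
    for p in points:
        by_side.setdefault(p.side, []).append(p)
    for pts in by_side.values():
        ordered = sorted(pts, key=lambda p: p.position)
        if any(a.number >= b.number for a, b in zip(ordered, ordered[1:])):
            return False
    return True


# The expert-defined library; only two illustrative constraints here.
CONSTRAINT_LIBRARY: Dict[str, Callable[[List[Landmark]], bool]] = {
    "parity_by_side": parity_by_side,
    "increasing_along_street": increasing_along_street,
}


def infer_constraints(landmarks: List[Landmark]) -> List[str]:
    """Return the names of library constraints that no landmark contradicts."""
    return [name for name, test in CONSTRAINT_LIBRARY.items() if test(landmarks)]


# Made-up landmark data for a single street.
landmarks = [
    Landmark("Street A", "left", 10.0, 1),
    Landmark("Street A", "left", 40.0, 5),
    Landmark("Street A", "right", 12.0, 2),
    Landmark("Street A", "right", 45.0, 8),
]
print(infer_constraints(landmarks))
```

In this sketch a constraint is kept whenever the landmarks do not contradict it. A real system would likely also need a minimum amount of supporting evidence before adopting a constraint.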

Improving  Problem-Solving  Performance 

The  benefits  of  an  accurate  model  are  only  fully  realized  when  the  solving  mechanism 
takes  advantage  of  the  structure  and  characteristics  of  a  problem  instance.  To  generate  a 
precise  solution,  the  solving  component  must  be  flexible  in  supporting  varying  problem 
models.  To  improve  the  performance  of  problem  solving,  the  solver  should  exploit  the 
structure  of  the  problem  and  incorporate  appropriate  heuristics  to  reduce  the  explored 
search space. Therefore, we extended the solver we developed under a previous AFOSR
grant  to  support  the  constraint  models  we  infer.  By  developing  a  standardized 
representation  for  all  problem  instances,  which  includes  the  inferred  constraint  model,  we 
developed  an  end-to-end  building  identification  application  that  can  identify  buildings  in 
areas larger than previously possible (Michalowski et al. 2007b). This application's
architecture is shown in Figure 2. Our empirical evaluations show that solution
quality and runtime performance are greatly improved when using such an end-to-end
system compared to our previous approach.


[Figure 2 components, in reading order: Instance data; Constraint Inference; Inferred Constraints; Model Generation; Problem Model; Constraint Solver]

Figure  2.  Building  Identification  Application  Architecture 


Framework  for  Geospatial  Data  Integration  and  Reasoning 

Our  goal  is  to  develop  a  framework  for  integrating  and  reasoning  about  geospatial  data. 
The  various  geospatial  layers  are  integrated  on  top  of  a  base  layer,  such  as  the  satellite 
imagery for a given area. The system imports other data and converts
it into a uniform KML format. The reasoning processes then operate on the data
layers that are available and either generate new layers or associate the results of the
reasoning  with  the  existing  layers.  This  uniform  approach  to  representing  and  reasoning 
about  the  data  hides  the  heterogeneity  present  in  the  input  data  formats  from  the 
geospatial  reasoning  processes,  thus  allowing  them  to  focus  on  the  logic.  This 
heterogeneity  has  been  a  major  hurdle  for  achieving  semantic  interoperability  of 
geospatial  data  sources.  The  reasoning  methods  are  able  to  exploit  the  integrated  data  and 
present  the  results  on  a  map  or  image  using  this  framework. 
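
As a rough illustration of converting imported data into a uniform KML layer, the sketch below normalizes a CSV source into point placemarks. The CSV column names, the function names, and the sample coordinates are assumptions for illustration only; just the KML 2.2 namespace and the longitude,latitude coordinate order come from the KML standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def from_csv(text):
    """Parse rows with 'name,lat,lon' columns (a hypothetical input format)."""
    return [(row["name"], float(row["lon"]), float(row["lat"]))
            for row in csv.DictReader(io.StringIO(text))]


def to_kml(layer_name, features):
    """Convert (name, lon, lat) tuples into a KML document string."""
    ET.register_namespace("", KML_NS)  # serialize without an ns0: prefix
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    ET.SubElement(doc, f"{{{KML_NS}}}name").text = layer_name
    for name, lon, lat in features:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Illustrative values only, not taken from the actual Belgrade data.
csv_text = "name,lat,lon\nBuilding 1,44.8,20.46\n"
kml_text = to_kml("buildings", from_csv(csv_text))
print(kml_text)
```

Once every source is reduced to the same placemark structure, a reasoning process only has to read KML and does not need to know which original format a layer came from.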

Figure 3 shows the interface for InfoFuse, which is implemented on top of Google Maps.
The  right  column  shows  the  various  operations  available  to  import  data  and  reason  about 
the  existing  data.  The  system  operates  entirely  on  real-world  data  for  the  city  of  Belgrade. 
This  figure  shows  the  streets  and  buildings  for  a  given  region  in  Belgrade.  At  this  point 
in  the  processing,  the  system  has  imported  the  data  for  each  of  the  streets  shown  from  the 
white pages and yellow pages for Belgrade. Thus, it simply has a list of the residents and
businesses that are on a given street. The next task is to apply an information reasoning
process to determine which address is associated with which building. In order to
determine how to map the telephone book data to the individual buildings, the system
turns the problem into a constraint satisfaction problem (CSP) [Bayer et al., 2007;
Michalowski & Knoblock, 2005]. The CSP formulation (Figure 3) integrates the vector
data  that  defines  the  layout  of  the  streets  in  a  city,  the  building  locations  along  the  street, 
the  addresses  obtained  from  online  phonebooks,  and  the  addressing  patterns  used  in  the 
given  city. 
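
To make the CSP formulation concrete, here is a minimal backtracking sketch. It is a simplification under stated assumptions rather than the solver used in InfoFuse: buildings are the variables, the phone book street numbers form every domain, and the only constraints are odd/even parity by side, increasing numbers along each side, and no address used twice. The building identifiers and numbers are made up.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Buildings in order along the street: (building id, parity of its side).
buildings: List[Tuple[str, str]] = [("b1", "odd"), ("b2", "odd"), ("b3", "even")]
# Street numbers found in the phone book for this street (made-up values).
phonebook_numbers: List[int] = [1, 2, 3, 5]


def consistent(assignment: Dict[str, Tuple[str, int]], side: str, number: int) -> bool:
    """Check one candidate number against the addressing constraints."""
    if number % 2 != (1 if side == "odd" else 0):
        return False  # parity constraint for this side of the street
    for other_side, other_number in assignment.values():
        if other_number == number:
            return False  # each address belongs to one building
        if other_side == side and other_number >= number:
            return False  # numbers increase along each side
    return True


def solve(buildings: List[Tuple[str, str]],
          numbers: List[int],
          assignment: Optional[Dict[str, Tuple[str, int]]] = None
          ) -> Iterator[Dict[str, int]]:
    """Enumerate all assignments of numbers to buildings by backtracking."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(buildings):
        yield {b: n for b, (_, n) in assignment.items()}
        return
    # Buildings are assigned in street order, so earlier buildings on the
    # same side must already hold smaller numbers.
    building, side = buildings[len(assignment)]
    for number in numbers:
        if consistent(assignment, side, number):
            assignment[building] = (side, number)
            yield from solve(buildings, numbers, assignment)
            del assignment[building]


solutions = list(solve(buildings, phonebook_numbers))
for solution in solutions:
    print(solution)
```

The sketch enumerates every consistent assignment instead of stopping at the first one, because the set of all solutions shows which buildings are uniquely determined and which remain ambiguous.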

Figure 3: InfoFuse: A system for integrating and reasoning about geospatial data.


The  reasoning  task  consists  of  various  integration  and  geospatial  reasoning  steps.  In 
InfoFuse,  we  focused  on  a  specific  integration  task  to  solve  the  real-world  problem  of 
identifying  buildings  in  imagery.  The  main  steps  involved  in  this  task  are  identifying 
streets  in  an  image,  identifying  building  locations,  identifying  building  addresses  and 
linking  business  data.  InfoFuse  provides  several  alternative  operations  through  the 
interface to carry out these tasks. The user can identify streets by automatically loading
the OpenStreetMap data using a software wrapper, importing an existing road vector layer, or
interactively drawing the road lines. Figure 4 shows an example of street identification
for a selected area.

Figure 4: Streets identified (green lines) with OpenStreetMap data.


The  user  can  identify  the  building  locations  using  similar  operations  available  in 
InfoFuse. For example, this can be done by manually drawing points or polygons (Figure 5)
to represent the buildings, loading an existing KML layer for the building locations,
or  extracting  data  from  another  source,  such  as  WikiMapia. 


[Figure 5 interface panel:
STEP 2: IDENTIFY STREETS: Draw Street Manually; Label Street Manually; Import KML File; View Maps Online; Load OpenStreetMap Data
STEP 3: IDENTIFY BUILDING LOCATIONS: Draw Point; Draw Polygon; Import KML File; Process LIDAR Data; Import WikiMapia Layer
STEP 4: IDENTIFY BUILDING ADDRESSES: Load Phone Book Data]


Figure  5:  Building  locations  manually  identified  as  polygons. 


InfoFuse gathers current information about people and businesses for a region by
executing wrappers over the Yellow Pages and White Pages websites. InfoFuse then links
the extracted data with the road vector data and makes it available for viewing. Figure 6
shows the business listings and phonebook data in the popup for Uskocka street in
Belgrade. The CSP reasoner combines the road vector data, the building location
data, and the phone book data in a reasoning process to map the addresses to the
individual buildings. This reasoning process takes the phone book data associated with
the road vectors, performs the reasoning over the data, and links the resulting data to the
individual buildings [Bayer et al., 2007; Michalowski & Knoblock, 2005]. Instances of
building  variables  that  are  mapped  to  a  single  address  are  depicted  with  green  placemarks 
and  instances  mapped  to  multiple  addresses  are  depicted  with  red  placemarks.  The 
ambiguity  of  multiple  possible  addresses  mapped  to  a  single  location  is  due  to  the 
uncertainty  that  may  be  present  in  the  input  data,  such  as  missing  addresses  in  the 
phonebook. 
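
One plausible way to derive the green and red placemarks is to collect, for each building, every address it receives across all consistent solutions. The sketch below assumes the solutions are available as dictionaries from building identifier to street number. The function name and the sample solutions are illustrative and not taken from InfoFuse.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def placemark_styles(solutions: List[Dict[str, int]]) -> Dict[str, Tuple[str, List[int]]]:
    """Map each building to (color, candidate addresses) over all solutions."""
    candidates = defaultdict(set)
    for solution in solutions:
        for building, number in solution.items():
            candidates[building].add(number)
    return {
        # One candidate address: green. Several candidates: red.
        building: ("green" if len(numbers) == 1 else "red", sorted(numbers))
        for building, numbers in candidates.items()
    }


# Made-up solutions, as a CSP solver might return them.
solutions = [
    {"b1": 1, "b2": 3, "b3": 2},
    {"b1": 1, "b2": 5, "b3": 2},
    {"b1": 3, "b2": 5, "b3": 2},
]
styles = placemark_styles(solutions)
for building, (color, numbers) in styles.items():
    print(building, color, numbers)
```

Here building `b3` is green because every solution gives it the same address, while `b1` and `b2` stay red because the available data cannot separate their candidate addresses.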

Figure 6: InfoFuse displays the resulting mapping.